perm filename ABBIS2[1,RWF] blob sn#728158 filedate 1983-10-26 generic text, type C, neo UTF8
COMMENT ⊗   VALID 00009 PAGES
C REC  PAGE   DESCRIPTION
C00001 00001
C00002 00002	Programs
C00005 00003	Standard Algorithms in Pascal
C00009 00004	←OUTPUT and output.←
C00011 00005	←Reading Programs←
C00012 00006	←Test Files←
C00013 00007	More Dynamic Programming.
C00018 00008	Assume we are given a road map, like the one below, that shows the distance
C00023 00009	←Sorting/Programming Economics←
C00029 ENDMK
C⊗;
Programs

A ←program← in Pascal is a complete algorithm, describing a computation in
sufficient detail that it can be carried out by a computer.  A Pascal
program contains information about the resources the computer will need to
perform the computation, and detailed instructions about the sequence of
computational steps.  The simplest type of Pascal program has the form
	PROGRAM name (OUPUT);
	BEGIN
	command;
	command;
	...
	...
	command
	END.

where the programmer fills in  the name of his program where `name' appears
in the form, and fills in commands to carry out the desired computation
where `command' appears in the form.

Program names can be made up at will from the English alphabet.  There are
many kinds of commands, but every program will contain some commands to
print its results so that we can see them.  The form of such a command can be
	WRITE (expression, expression, ..., expression)
where the expressions can be numbers, or forumlas built up from numbers
using symbols that express the operations of arithmetic and elementary
functions. 

1.  (Material from my published Pascal Notes goes here)

We need a compact, unambiguous way to express the forms of programs,
commands, etc., that are acceptable in Pascal.  We will use ←transition
diagrams← for these forms.  The transition diagram for programs is:

program:	PROGRAM		name	(OUTPUT);

		BEGIN	command		;
					END.

Any path through the diagram is a legitimate form for a program.  Lower
case names, in rectangular boxes, like "command" stand for sections which
the programmer must fill in.  In the diagram above, every time we come to
"command", we have to fill in a legitimate command according to another
transition diagram:

command:	WRITE	(	expression	,

		WRITELN				)

Standard Algorithms in Pascal

Electronic computers have built-in algorithms for elementary arithmetic.
Usually the built-in algorithms include addition, subtraction,
multiplication, division with optional remainder, and testing whether one
number is greater than, equal to, or less than, another.  Pascal uses these
algorithms.  A Pascal programmer need not concern himself with how division
is done; he writes 22/7 in his program, and the result 3.142857 is
automatically supplied.  Other functions, including the log, trig, and
exponential functions, are not built into the computer, but can be comuted
by small programs that the Pascal translator automatically includes in any
program that needs them. If a Pascal programmer writes the expression
LOG(2) + TAN(3.1415926/4), the translator provides for finding the
logarithm, adding, etc., by a fast and accurate algorithm.  The forms of
expression in Pascal that use the standard numerical algorithms are these,
where A and B are numbers, or expressions with numerical values.
A + B, A - B, A/B, SIN(A), COS(A), ... have their customary meaning.  
A * B is the product of A and B.  Remember not to use A x B; Pascal won't
understand.

SQRT(A)  is the positive square root of A (A>=0)
SQR(A) is A↑2 (Don't confuse SQRT with SQR)
LN(A) is the natural (base = 2.71828...) logarithm.
LOG(A) is the common (base 10) logarithm.
...
...

When several standard algorithms are used in the same expression, as in 
2 + 3/4 * 5 - 6, a set of conventions determine what is done first.
Generally, where possible, multiplications and divisions are done first,
from left to right, giving 3/4 = 0.75 and 0.75 * 5 = 3.75; next additions
and subtractions are done from left to right, giving
2 + 3.75 = 5.75 and 5.75 - 6 = -0.25, the value of the expression.
Parentheses may be used where the convention is not the same as the
programmer's intention;   2 + 3/(4 * 5) - 6 gives
4 * 5 = 20, 3/20 = 0.15, 2 + 0.15 = 2.15, 2.15 - 6 = -3.85.

←OUTPUT and output.←

The results printed by your programs go inot a file called OUTPUT.  this is
only a temporary name for the file, used while your program is being
executed.  When the program is finished, the file is saved in your
directory, under another name.  Reasons for this system are:

(1) You may want to execute the same program several times, and keep all
the output files; they need distinct names.

(2) You may want to execute different programs, all of which call their
result files OUTPUT, and keep all the output files; again, they need
distinct names.

(3) The system for naming files varies from one computer to another.  The
Pascal standard can't accomodate all of them.  At LOTS, for example, a file
name has three parts, separated by periods; the third part is a number.

When you execute a Pascal program at LOTS, after translating the program
into machine language the executive program will ask 
OUTPUT=
and wait for you to type the permanent name of your result file.  I
normally use the program name, followed by a ".OUT".

←Reading Programs←

No textbook can hint at all the methods programmers use in their programs.
We recommend that you read every Pascal program that you can legitimately
lay your hands on.  Sources include other textbooks (watch out, though:
many contain errors, unlike this one), class handouts of this and other
courses, and the LOTS directory <CS10X>.  You might ask more experienced
programmers if you can read some of their programs, but please don't browse
without asking.

←Test Files←

In testing a program which will use a large data file, such as a dictionary
for a spelling checker program, use a much smaller test file, for several
reasons:

(1) Your testing will take less time.

(2) You will be able to predict exactly what the program should do.

(3) You can easily change the file to exercise all parts of the program.
(Long words, words which come after the last dictionary entry,....)

More Dynamic Programming.

I play a simple dice game with an opponent: he rolls two dice, then I roll
two dice, and we alternate until one of us has a total of 25 or more.  He
offers to bet three to one that he will win.  I should accept if my winning
chance is more than 0.25, and decline if less.
(This problem is a simplification of problems arising in backgammon endgames).

I can get a tolerable approximation of my chances by playing the game by
myself a few hundred times, but if the actual chance is within a few
percent of 0.25, I would need to play perhaps a thousand times to avoid a
significant chance of making the wrong decision.  I can program a computer
to make the same experiment, but it is easier, faster, and more reliable to
have it calculate the exact probabilities by another application of the
dynamic programming method.  Let f(A,B) be the chance for the first player
to win when he needs a total of A to win, and the second player needs a
total of B to win.  If A is zero or negative, f(A,B) = 1; if B is zero or
negative, f(A,B) = 0.  Otherwise, the first player rolls two dice, getting
numbers i and j between 1 and 6, and the second player now gets a turn, so
the second player's chance is f(B,A-i-j).  The two player's chances total
1.00, so the first player's chance after rolling the dice is 1-f(B,A-i-j).
The first player's chance before rolling the dice is found by averaging
over all the possible dice rolls; it is 

	1 - 1/36 (sigma) [1<=i<=6, 1<=j<=6] f(B, A-i-j)
			↑

I can calculate f(A,B) for each value of A and B up to 25, and save each in
a table as I go, for use in calculating f for larger values of A and B.  I
must be sure, though, that when I calculate f(A,B) I have already
calculated f(B,A-2) through f(B, A-12);  this can be calculated if I do the
calculation in the order

f(1,1)
f(2,1), f(1,2), f(2,2)
f(3,1), f(3,2), f(1,3), f(2,3), f(3,3)
f(4,1), f(4,2), f(4,3), f(1,4), f(2,4), f(3,4), f(4,4)
etc.

(Another satisfactory order would be by increasing sums of A and B)

The algorithm can be summarized this way:

	Initialize;

	FOR LARGER:= 1 TO 2.5 DO
		BEGIN
		FOR K:=1 TO LARGER-1 DO
			GETF(LARGER, K);
		FOR K:=1 TO LARGER DO
			GETF(K, LARGER)
		END;

	WRITE F[25,25]

where the procedure GETF(A,B) has this body:

	BEGIN
	SUM:=0,
	FOR I:=1 TO 6 DO
		FOR J:=1 TO 6 DO
			SUM:=SUM + F[B,A-I-J];
	F[A,B]:=1.00 -SUM/36.0
	END

Exercise:

Which values of the array F need to be initialized?  What  should the
subscript bounds on F be?

The theme of this example is that the easiest way to compute a function for
one assignment of argument values may be to compute a complete table of the
function. 

Assume we are given a road map, like the one below, that shows the distance
between those pairs of cities that are directly connected by a single
stretch of highway.  How can we produce a table that shows the length of
the shortest route between each pair of cities?











A standard paradigm for approaching problems taht involve a large number of
entities (in this case, cities) is to solve the problem first for small
subsets, then larger and larger ones.  If we had already solved the problem
for six cities, and solved the results in an array, what would we need to
do when a seventh city is introduced?

(1) The distance from the new city to itself is always zero.

(2) The best route from the new city (`A') to one of the earlier cities
(`B') is either a direct road, or consists of going from A to another of
the earlier cities (`C') and taking the best route from C to B, naturally
not passing through A; by trying every possible city in the role of C, we
can find the shortest route from A to B.

(3) The best route from city D to city E is either the previous best route,
or goes from D to the new city A, and from A to E.  In the latter case,
step 3 has already found the best such routes, and we can add their
lengths.

By using the calculations implied by (1), (2), and (3) for each city in
turn, we build up the table until it reflects the entire map.  To present
the algorithm in Pascal, assume we have an array DIRECT, where D[I,J] is a
very large number BIG, like 10000, which is sure to be larger than the
length of the shortest route.

We want to fill in an array DISTANCE, so that DISTANCE[I,J] is the length
of the shortest route from city  I to city J.  The rules above are the
basis of a Pascal algorithm.

	FOR A:=1 TO N DO
		DISTANCE [A,A]:=0;
		FOR B:=1 TO A-1 DO
			DIST :=DIRECT[A,B];
			FOR C:=1 TO A-1 DO
				IF DIST > DIRECT[A,C]+DISTANCE[C,B]
					THEN DIST:=DIRECT[A,C]+DISTANCE[C,B];
		DISTANCE[A,B]:=DIST;DISTANCE[B,A]:=DIST;
		FOR D:=1 TO A-1 DO
			FOR E:=1 TO A-1 DO
				IF DISTANCE[D,A]+DISTANCE[A,E] < DISTANCE[D,E]
				THEN DISTANCE[D,E]:=DISTANCE[D,A]+DISTANCE[A,E]

Exercise:  Modify the above algorithm so that it also computes FIRST[I,J],
the number of the first city on the shortest route from I to J.

Exercise:  After performing the above modification, add a stage to the
algorithm that will read the numbers of two cities and print out the
numbers of all the cities along the shortest route between them.

Here is a simpler Pascal algorithm for the same problem; it is, however,
harder to explain.  See if you can discover why it works.

	FOR I:=1 TO N DO
		FOR K:=1 TO N DO
			DISTANCE[I,K]:=DIRECT[I,K];
	FOR I:=1 TO N DO
		FOR K:=1 TO N DO
			FOR J:=1 TO N DO
				IF DISTANCE[I,K]>DISTANCE[I,J]+DISTANCE[J,K]
					THEN DISTANCE[I,K]:=DISTANCE[I,J]+DISTANCE[J,K]

←Sorting/Programming Economics←

I want to write a program to read N numbers, and put them into array A in
increasing order.  In practice, such a program would more likely be reading
strings and using alphabetical order, but using numbers keeps the algorithm
simple.  If the numbers arrive in the order
	388 909 476 820 245 371
the successive contents of the variables used by the program will be

	I	A[1]	A[2]	A[3]	A[4]	A[5]	A[6]
----------------------------------------------------------------
	0	?	?	?	?	?	?
	1	388	?	?	?	?	?
	2	388	909	?	?	?	?
	3	388	476	909	?	?	?
	4	388	476	820	909	?	?
	5	245	388	476	820	909	?
	6	245	371	388	476	820	909

Each successive arrival is inserted inot the array by moving the previous
numbers to the right, starting with the rightmost ones, until a number is
found which is not larger than the new one.  In Pascal

	FOR I:= 1 TO N DO
		BEGIN
		READ(NEWDATUM);
		J:=I;	(*CANDIDATE LOCATION FOR NEWDATUM*)
		WHILE(NEWDATUM < A[J-1]) AND (J>1) DO
			BEGIN
			A[J]:=A[J-1];
			J:=J-1;
			END;
		A[J]:=NEWDATUM
		END

Exercise:  In the above algorithm, nothing is ever placed in A[0], but that
variable must exist.  Why?

The inner iteration can be simplified slightly by assuming that the number
in a[0] is not greater than the new datum.  In effect, the small number in
A[0] serves as a sentinel to stop decreasing J so that the test for J > 1
is no longer needed.  The above algorithm is preceded by A[0]:=SMALL, where
SMALL is the smallest possible constant, and the inner iterative clause
becomes WHILE NEWDATUM < A[J-1] DO

The sorting method given above is called sorting by insertion.  It is a
very good algorithm if N is no more than 10 or 15.  If a program uses it
many times with a large value of N, it may be worthwhile to use a more
efficient algorithm.  To make the economically correct choice, you need to
calculate the cost of the wasted computer time, and the cost of the
programming time to use a better algorithm.

When N is large, other algorithms can sort many times faster, e.g. more
than ten times faster if N=1000, so almost the entire running time is
wasted.  the insertion sort algorithm, on the average, moves the average
datum half way through a sequence of N/2 previous data, so the number of
times through the inner iteration is about N x 1/2 x N/2 = N↑2/4.  There
are about ten machine operations in the inner iteration, each taking up
about 10↑-6 seconds on my computer, for a total time of 10↑-5 x n↑2/4
seconds.  If computer time costs $0.25 per second, and N is 5000, the cost
is about $15 for each sorting job.  If using a better algorithm requires
two hours work for a programmer paid $25 per hour, the cost of using a
better algorithm should be used if the sorting will be done four times or
more. (Remember, in estimating programmer time, to include debugging and
documentation as well as time to find or design the algorithm)

The typical reader of this book is a student programming problems chosen to
use insignificant amounts of computer time, so the emphasis here is on
simplicity and reliability rather than efficiency.

A professional approach to programming, though, includes awareness of the
tradeoffs between labor and machine costs, and willingness to investigate
the extensive research results on program efficiency.  For our example of
sorting, Knuth's "Searching and Sorting" is a definitive study.  The
program libraries for many computers include excellent sorting programs.